A comparative evaluation of software techniques to hide memory latency
Abstract
Software-oriented techniques for hiding memory latency in superscalar and superpipelined machines include loop unrolling, software pipelining, and software-controlled cache prefetching. Issuing a data fetch request before the data are actually needed allows the memory access to overlap with useful computation. Loop unrolling and software pipelining require no changes to the microarchitecture or the instruction set architecture, whereas software-controlled prefetching does. Although the benefits of the individual techniques have been studied, no study evaluates all of them within a consistent framework. This paper attempts to remedy this by providing a comparative evaluation of the features and benefits of the techniques. Loop unrolling combined with static scheduling of loads is seen to produce a significant performance improvement at lower latencies. Software pipelining is observed to perform better than software-controlled prefetching at lower latencies, but at higher latencies software prefetching outperforms software pipelining. Aggressive prefetching beyond conditional branches can degrade performance by increasing memory bandwidth requirements and bus traffic.
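The three techniques named in the abstract can be illustrated at source level. The following is a minimal C sketch, not taken from the paper, that sums an array four ways: a baseline loop, a loop unrolled by four with its loads grouped ahead of the adds, a software-pipelined loop that issues the next iteration's load before the current iteration's add, and a loop with software-controlled prefetching. The function names and the prefetch distance `PREFETCH_DIST` are hypothetical, and `__builtin_prefetch` (a GCC/Clang extension) stands in for the prefetch instruction that the paper assumes the ISA provides. An optimizing C compiler may reorder loads on its own, so the code shows the intended schedule rather than guaranteeing it.

```c
#include <stddef.h>

/* Baseline: each iteration loads a[i] and uses it immediately, so a cache
   miss on a[i] stalls the dependent add. */
long sum_baseline(const long *a, size_t n) {
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Loop unrolling by 4 with the loads scheduled before the adds: the four
   independent loads can be outstanding at the same time. */
long sum_unrolled4(const long *a, size_t n) {
    long s = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        long x0 = a[i], x1 = a[i + 1], x2 = a[i + 2], x3 = a[i + 3];
        s += x0 + x1 + x2 + x3;
    }
    for (; i < n; i++)          /* remainder iterations */
        s += a[i];
    return s;
}

/* Software pipelining: the load for iteration i+1 is issued in the same
   loop body as the add for iteration i, giving each load a full iteration
   to complete before its value is consumed. */
long sum_swpipe(const long *a, size_t n) {
    if (n == 0)
        return 0;
    long s = 0;
    long next = a[0];           /* prologue: first load */
    for (size_t i = 0; i + 1 < n; i++) {
        long cur = next;
        next = a[i + 1];        /* load for the next iteration */
        s += cur;               /* compute for this iteration */
    }
    s += next;                  /* epilogue: final add */
    return s;
}

/* Software-controlled prefetching: a non-binding prefetch for the element
   PREFETCH_DIST iterations ahead (hypothetical tuning value). The guard
   keeps the address inside the array, since forming a pointer past the
   end of the array is undefined behavior in C. */
#define PREFETCH_DIST 16
long sum_prefetch(const long *a, size_t n) {
    long s = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_DIST < n)
            __builtin_prefetch(&a[i + PREFETCH_DIST], 0, 3); /* read, keep in cache */
        s += a[i];
    }
    return s;
}
```

All four functions return the same sum; only the placement of loads relative to their uses differs. The prefetch distance trades timeliness against wasted bandwidth: too short and the data still arrive late, too long and prefetched lines may be evicted before they are used. On targets without a prefetch instruction, GCC emits no code for `__builtin_prefetch`, which mirrors the abstract's point that this technique depends on ISA support.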
Similar resources
An Analysis of a Combined Hardware-software Mechanism for Speculative Loads
This paper describes a simple hardware mechanism and related compiler support for software-controlled speculative loads. The compiler issues speculative load instructions based on anticipated data references and the ability of the memory system to hide memory latency in high-performance processors. The architectural support for such a mechanism is simple and minimal, yet handles faults graceful...
Data prefetching for linear algebra operations on high performance workstations
In a previous work it was shown that the performance of linear algebra computations, which access large amounts of data, depends on the behavior of the memory hierarchy. This research aims to use the multilevel orthogonal blocking approach in conjunction with other software techniques to further improve the performance of linear algebra computations. The performance of the dense matrix...
Comparative Evaluation of Latency Tolerance Techniques for Software Distributed Shared Memory
A key challenge in achieving high performance on software DSM systems is overcoming their relatively large communication latencies. In this paper, we consider two techniques which address this problem: prefetching and multithreading. While previous studies have examined each of these techniques in isolation, this paper is the first to evaluate both techniques using a consistent hardware platform ...
A Survey of prefetching techniques
As the gap between processor and memory speeds increases, memory latencies have become a critical bottleneck for computer performance. To reduce the bottleneck, designers have had to create methods to hide these latencies. One popular method is prefetching. This method fetches the data from the memory system before being asked for by the processor, with the expectation that it will soon be refe...
The Latency Hiding Effectiveness of Decoupled Access/Execute Processors
Several studies have demonstrated that out-of-order execution processors may not be the most adequate organization for wide issue processors due to the increasing penalties that wire delays will cause in the issue logic. The main target of out-of-order execution is to hide functional unit latencies and memory latency. However, the former can be quite effectively handled at compile time and this...